Combining resources for MWE-token classification
نویسندگان
چکیده
We study the task of automatically disambiguating word combinations such as jump the gun which are ambiguous between a literal and MWE interpretation, focusing on the utility of type-level features from an MWE lexicon for the disambiguation task. To this end we combine gold-standard idiomaticity of tokens in the OpenMWE corpus with MWE-type-level information drawn from the recently-published JDMWE lexicon. We find that constituent modifiability in an MWE-type is more predictive of the idiomaticity of its tokens than other constituent characteristics such as semantic class or part of speech.
منابع مشابه
Verb Noun Construction MWE Token Classification
We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. We present a supervised learning approach to the problem. We experiment with different features. Our approach yields the best results to date on MWE c...
متن کاملFleshing it out: A Supervised Approach to MWE-token and MWE-type Classification
Although some multiword expressions (MWEs) like How do you do? have exclusively idiomatic meaning, other MWEtypes like the phrase kick the bucket may be idiomatic or literal depending on context. The recently developed OpenMWE corpus provides the largest freely available collection of annotated MWE-tokens suitable for supervised classification, but so far its potential has only been superficial...
متن کاملjMWE: A Java Toolkit for Detecting Multi-Word Expressions
jMWE is a Java library for implementing and testing algorithms that detect Multi-Word Expression (MWE) tokens in text. It provides (1) a detector API, including implementations of several detectors, (2) facilities for constructing indices of MWE types that may be used by the detectors, and (3) a testing framework for measuring the performance of a MWE detector. The software is available for fre...
متن کاملConstruction of an English Dependency Corpus incorporating Compound Function Words
The recognition of multiword expressions (MWEs) in a sentence is important for such linguistic analyses as syntactic and semantic parsing, because it is known that combining an MWE into a single token improves accuracy for various NLP tasks, such as dependency parsing and constituency parsing. However, MWEs are not annotated in Penn Treebank. Furthermore, when converting word-based dependency t...
متن کاملDeep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework
We explore the consequences of representing token segmentations as hierarchical structures (trees) for the task of Multiword Expression (MWE) recognition, in isolation or in combination with dependency parsing. We propose a novel representation of token segmentation as trees on tokens, resembling dependency trees. Given this new representation, we present and evaluate two different architecture...
متن کامل